Abstract


1  Summary  of  Research  Achievements 

This  project  developed  analytical  and  computational  tools  to  design  integrated  sensor-control 
systems,  where  the  controller  is  part  of  the  sensor  and  designed  so  as  to  maximize  task-specific 
information.  Within  this  broad  umbrella,  we  have  focused  on  visual  sensors  (EO  imagery), 
inertial  sensors  (accelerometers  and  gyrometers)  and  ranging  sensors  (structured  light),  and 
their integration in support of mobility tasks (exploration) and decisions (detection, localization,
recognition, categorization of objects and scenes). The task determines what part of the
data-formation process is a nuisance, i.e. a factor that is irrelevant to the task but nevertheless affects
the data. The resulting sensor-control system therefore depends on both the data and
the task. We have focused on tasks that require invariance or co-variance to illumination
and to vantage point. In this case the control reduces to mobility of the sensor platform, so as
to overcome occlusion or scaling limitations in the passive version of the sensor. Therefore,
the  actuation,  control,  and  sensing  systems  are  collectively  considered  an  active  sensor,  and 
algorithms  for  inference,  planning  and  control  can  be  co-designed  so  as  to  achieve  maximum 
uncertainty  reduction  in  the  task,  or  maximum  actionable  information  [11]. 

1.1  Modeling  the  agent 

The  first  ingredient  to  establish  an  active  remote  sensor  is  the  ability  to  move,  which  requires 
the  ability  to  localize  the  sensor  platform,  or  agent ,  relative  to  the  surrounding  environment. 
Thus, inferring (causally, and in real time) a representation of the environment and the
agent’s location (position and orientation relative to it) is a key enabler and a fundamental,
classical problem in a number of fields. It is well known [10], for instance, that pose
(a trajectory in the Lie group SE(3)) can be inferred up to a spatial similarity transformation
by a monocular EO sensor, under suitable conditions. However, knowledge of scale is
essential for interaction, so EO-only approaches are not suitable for active sensing control.
Traditionally, inertial navigation provides scaled estimates of pose, but without reference to
the surrounding environment and based on a doubly-integrated non-observable model that
yields diverging error dynamics. Thus our first accomplishment is the study of the integration of
visual and inertial sensing, and the development of what we believe to be the most advanced
platform for visual-inertial fusion: [9] shows feasibility, [15] shows flexibility, [14] shows robustness.
Part of this work will be presented at the next ICRA (International Conference
on Robotics and Automation), where the first of these is short-listed for Best Conference Paper. The
critical element of this work is its focus on robustness: we have shown [12] that
most visual data is useless for most tasks, and therefore one can expect, as indeed
happens, that most of the data consists of outlier measurements. Unlike the settings usually
assumed by traditional filtering methods from robust statistics, in the scenario of interest
outliers typically form the majority of the measurements. It has been necessary, therefore, to revisit classical robust
filtering to handle these scenarios, which has been accomplished during the project. Specific
accomplishments in this portion of the project include:

• We have shown that commonly used models for visual-inertial fusion are not observable/identifiable.
While they would be identifiable if accelerometer and gyroscope bias rates were
known or constant, in general the bias rates are neither. This (negative) result undermines much of
the prior analysis of observability and identifiability of visual-inertial sensor fusion.

• Although the model is not observable, we have shown that the set of indistinguishable state trajectories
is bounded, and we have computed it explicitly as a function of sensor characteristics
and motion statistics.

• We have used the analysis to derive a model for a nonlinear filter whose estimate
converges to a state within a set, and we have derived bounds on that set.

• We have designed an outlier rejection algorithm based on a finite whiteness test (Box-Ljung)
computed on a temporal sliding window, and a causal smoothing scheme to support
its computation, which is shown to approximate the optimal (Neyman-Pearson)
discriminant.

• We have demonstrated the system live at CVPR, benchmarked against Google Tango -
a project that benefits from corporate backing and over 20 engineers working full-time
on it for over 2 years - and outperformed it, although our system is the effort of a single graduate student.

•  The  paper  is  a  finalist  for  Best  Conference  Paper  at  the  next  ICRA. 
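
As an illustration of the whiteness-based outlier test listed above, the following is a minimal sketch, not the implementation used in the project. It assumes a scalar sequence of filter residuals (innovations) and computes the Ljung-Box statistic Q = n(n+2) Σ_k ρ_k² / (n-k) over a sliding window; a window is flagged when Q exceeds a chi-square critical value. The function names, the window length, the number of lags and the 95% threshold are all assumptions chosen for illustration, and the causal smoothing scheme is not reproduced.

```python
# 0.95 quantile of the chi-square distribution with 5 degrees of freedom
# (assumed significance level; must match max_lag below).
CHI2_95_DOF5 = 11.0705


def autocorrelation(x, lag):
    """Sample autocorrelation of sequence x at the given positive lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0.0:
        # A constant window carries no correlation information.
        return 0.0
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic; roughly chi-square with max_lag dof if x is white."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def flag_nonwhite_windows(residuals, window=50, max_lag=5, threshold=CHI2_95_DOF5):
    """Slide a window over the residuals and test each one for whiteness.

    Returns a list of (end_index, q, is_outlier), one entry per full window,
    where end_index is the index of the last sample in that window.
    """
    flags = []
    for end in range(window, len(residuals) + 1):
        q = ljung_box_q(residuals[end - window:end], max_lag)
        flags.append((end - 1, q, q > threshold))
    return flags


if __name__ == "__main__":
    # A drifting residual is strongly autocorrelated and should be flagged.
    drifting = [0.1 * t for t in range(60)]
    for end_index, q, is_outlier in flag_nonwhite_windows(drifting)[:3]:
        print(end_index, round(q, 1), is_outlier)
```

In a tracking setting one would presumably run such a test per feature track and discard tracks whose residuals fail it; how the test is combined with the filter in the actual system is not specified here.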

1.2  Modeling  the  scene

Localization  is  only  the  first  step  to  enable  spatial  interaction  and  decision  tasks  concerning 
the  scene.  The  representation  of  the  underlying  environment  sufficient  to  support  localization 
is  typically  a  sparse  point  cloud.  This  is  clearly  insufficient  for  most  other  tasks  that  require 
at  least  the  topology  of  the  scene  to  determine  what  surfaces  or  “objects”  are  neighbors.  For 
instance, for navigation, it is vital to know whether the space between two points is
occupied by their supporting surfaces or is empty: in the latter case it
is traversable, in the former it is not.

To this end, we have developed methods for topology estimation and regularization, as
well as coupling between location estimation and coarse geometry: [4] couples the two, [3]
uses the technique for range imagery, [7] develops robust methods for densification and
reconstruction, [8] regularizes with the structure tensor. Furthermore, [5] develops the first
second-order method for geometric inverse problems.

As part of this effort, we have performed analysis and design of co-variant detectors
and their associated invariant descriptors (low-level, local descriptors [6]) and
dynamic scene analysis [13], leveraging work on occlusion detection [1, 2]. Specific
achievements include:

•  We  have  shown  that  surface  topology  and  geometry  can  be  computed  without  minimal 
surface  bias,  yielding  water-tight  surfaces  and  accurate  volume  measurements.  These 
have  been  used  by  neuroscientists  to  study  the  perceptual  bias  in  the  relation  between 
size  and  weight  of  objects. 

• We have developed novel regularizers that respect the surface geometry without the
need to know its topology, exploiting instead the (trivial) image topology. This
means that we can run dense reconstruction in real time.

• We have developed the first second-order (Newton-like) optimization scheme on geometric
shape spaces.

• We have leveraged prior work on occlusion detection to develop scene partition
schemes that can account for individual objects’ motion and relative occlusion, while
maintaining persistent tracking.

1.3  Controlling  the  sensor 

In [16] we have shown how the controller can be part of the sensor and collectively make a
system “the best sensor it can be”, in the sense of controlling the data acquisition process so
as to minimize task uncertainty. In the early phases of the project this construction
was restricted to cartoon two-dimensional objects, because the ensuing optimization problem
quickly becomes intractable and is in any case beyond the real-time, low-latency implementation
needed to close the loop. During the latter phase of the project we have developed efficient
computational approximations based on Poisson sampling that have enabled
us not only to extend it to 3D but also to allow non-compact domains, essential in remote
sensing such as vision and ranging. Specific achievements include:

• We have extended the optimal exploratory control work, previously developed, to non-compact
domains. In order to compute uncertainty reduction, one needs to compute
the “probability of visibility,” which is the probability of sensing portions of the scene
that are occluded. This requires a prior. If the domain is not compact, the uncertainty
is infinite, and therefore the uncertainty reduction (information) is not defined. We
have developed a method based on Poisson sampling that makes this mathematically
sound, and efficient to compute, with Poisson-Voronoi partitions.

• We have tested Poisson-Voronoi-based planning and (pseudo-)optimal control in simulated
environments in 2D and 3D.
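
To make the notion of a Poisson-Voronoi partition concrete, the following is a minimal sketch under simplifying assumptions; it is not the method developed in the project. It draws seed points from a homogeneous Poisson point process in a bounded 2D window and assigns query points to their nearest seed, which defines the Voronoi cells. The restriction to a finite window, the intensity value and all function names are assumptions made for illustration; the treatment of non-compact domains and the computation of uncertainty reduction are not reproduced.

```python
import math
import random


def sample_poisson_count(mean, rng):
    """Draw a Poisson-distributed count (Knuth's method, fine for moderate means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_poisson_seeds(intensity, width, height, rng):
    """Homogeneous Poisson point process on [0, width] x [0, height].

    The number of seeds is Poisson with mean intensity * area, and the
    seeds are placed independently and uniformly in the window.
    """
    n = sample_poisson_count(intensity * width * height, rng)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


def nearest_seed(point, seeds):
    """Index of the seed closest to the point (Euclidean distance)."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))


def voronoi_cells(points, seeds):
    """Group query points by Voronoi cell, i.e. by nearest seed."""
    if not seeds:
        raise ValueError("at least one seed is required")
    cells = {i: [] for i in range(len(seeds))}
    for p in points:
        cells[nearest_seed(p, seeds)].append(p)
    return cells


if __name__ == "__main__":
    rng = random.Random(0)
    seeds = sample_poisson_seeds(intensity=0.2, width=10.0, height=10.0, rng=rng)
    # Query points on a coarse grid; each lands in exactly one cell.
    grid = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
    if seeds:
        cells = voronoi_cells(grid, seeds)
        print(len(seeds), "seeds;", sum(len(c) for c in cells.values()), "points assigned")
```

A planner could then attach a prior (for example, a probability of being visible or occupied) to each cell and score candidate sensor motions by the cells they are expected to reveal; that step is left out because the scoring used in the project is not described here.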

While  all  the  milestones  foreseen  in  the  original  proposal  have  been  met,  new  paths 
forward  have  opened  during  the  investigation.  Specifically,  now  that  the  formalization  of  the 
problem  of  maximizing  “actionable  information”  has  been  done,  there  remains  the  need  to 
derive  tractable  approximations  that  come  with  some  kind  of  performance  guarantee.  The 
work on Poisson-Voronoi sampling is one such example, but much more work is needed to
extend it to more complex tasks, and to higher levels of abstraction of the scene,
where interaction and control are based not only on geometry and topology, but also on the
semantics of the scene, that is, its partition into objects and the description of their relations.

This  is  part  of  future  work  that  we  intend  to  commence  now. 